ABSTRACT

This paper presents a comprehensive review of special
and remedial education practice in formative evaluation of goal
attainment among mildly handicapped students. First, the relationship
of formative evaluation to the range of special education assessment
practice and its importance to the field are described. Then, four
critical issues in formative evaluation methodology are discussed:
focus of measurement, frequency of measurement, data display, and
data-utilization methods. The reviewed research supports the use of
ongoing criterion-referenced monitoring systems to improve the
instructional programs of mildly handicapped students. Research also
provides the basis for several statements on the nature of effective
monitoring systems among mildly handicapped populations. 
Specifically, the data base supports the use of (1) certain 
empirically validated critical behaviors as the focus of monitoring 
activity; (2) long-term goal statements that may encourage teachers 
to focus not only on the immediate instructional content, but also on 
maintenance and generalization of skills; and (3) relatively 
ambitious goals that support task persistence and striving. The 
literature also provides a rationale for teachers to measure student 
performance at least twice weekly; to graph data using their 
preferred convention; and to employ data-utilization rules for
determining when, and perhaps how, to modify students' instructional 
programs. An eight-page reference list is appended. (PN) 
Abstract

The purpose of this paper was to review special education practice in
formative evaluation of goal attainment among mildly handicapped
pupils. First, the relationship of formative evaluation to the range
of special education assessment practice and its importance to the
field are described. Then, four critical issues in formative
evaluation methodology are discussed: focus of measurement, frequency
of measurement, data display, and data-utilization methods. Finally, a
proposal is advanced for additional related research.






Review of Monitoring Procedures with Mildly Handicapped Students



The specification of goals and the evaluation of goal attainment are
fundamental to American schooling. Historically, primary goals of public
education have been those of Americanization and instilling democratic
values in youth (Mulhern, 1959), and evaluation has been conducted in a
routinely clerical way through accumulation of enrollment and attendance
records (Campbell, Cunningham, Nystrand, & Usdan, 1975). Over time,
however, notions concerning the nature of useful educational goals and what
activities constitute effective evaluation have changed dramatically. The
nature of this evolution is reviewed briefly below.

In the nineteenth century, educational goals and evaluation were
broad and related marginally to academic curriculum. Psychologists viewed
the brain as a composite of general intellectual activity (Eisner, 1967),
and, relatedly, prevalent beliefs held that identifying and strengthening
general faculties would produce concurrent educational growth. This
conceptualization of learning fostered the global definition and evaluation
of educational outcomes.

At the turn of the century, however, notable movement toward
specificity in goal definition and evaluation occurred as psychologists
began to develop the notion that educational growth might be operationalized
in terms of a series of specific learning products. This notion was based on
Thorndike's work, which demonstrated the specificity of transfer wherein
generalization of learning occurs when elements in the original learning
context are relevant and similar to elements in other contexts (Eisner,
1967). Applying Thorndike's work to the development of educational
curricula, Bobbitt (cited in Eisner, 1967) argued that life consists of the
performance of specific activities and that the numerous and discrete skills
and knowledge required for successful adulthood should constitute the
curriculum of schools.

This premise was central to Ralph Tyler's work in curriculum and
evaluation. As director of the National Assessment Project, he required
that specific educational objectives constitute the basis for designing
evaluation instruments. Under the auspices of the National Assessment of
Educational Progress, there was sustained effort to specify distinct
academic goals and to develop an array of related, psychometrically adequate
group achievement tests. Standardized administration of these tests to
large groups of students created a national data base by which summative
comparisons of educational effectiveness could be formulated among regions
of the country, school systems, schools, and pupils of varying
characteristics. This development represented a critical move toward more
direct, standardized measurement of group academic achievement as a
summative index of program effectiveness and goal attainment.

Psychologists like Gagne, Glaser, and Mager also were interested in
developing clear statements of educational objectives and achievement tests
based on those objectives (Bloom, Madaus, & Hastings, 1971). Compared to
Tyler, however, these researchers focused more on development of effective
instruction than on summative evaluation. They proposed that, in order to
increase educational effectiveness, educators should measure pupil
achievement over time, in relation to a specific set of desired outcome
behaviors. This idea led to the development of a methodology of
criterion-referenced formative evaluation.

Criterion-referenced formative evaluation is the ongoing collection
of pupil performance data, during program implementation, and with respect
to mastery of behaviorally stated goals. The purpose of such data
collection is to generate an information base that informs decisions
concerning how to revise and improve programs. Consequently, formative
evaluation methodology addresses strategies for (a) specifying clear,
distinct instructional objectives or domains (e.g., Bloom et al., 1971;
Popham, 1980), and (b) developing technically adequate criterion-referenced
measurement procedures that can be matched isomorphically to instructional
objectives or domains (e.g., Popham, 1980).

Formative evaluation for revision and improvement of instructional
programs perhaps has realized its greatest impact in special and remedial
education programs, where conventional instructional practice, by
definition, is ineffective. The literature on the effectiveness of
formative evaluation, or continuous monitoring of individual pupil progress
with revision of educational programs, is robust. The Direct Instruction
(e.g., Gersten, Carnine, & White, 1984), applied behavior analysis (e.g.,
Lovitt, 1981; Rieth, Polsgrove, & Semmel, 1981), general special education
(e.g., Fuchs, Deno, & Mirkin, 1984), effective schools (Eubanks & Levine,
1983; Hoffman & Rutherford, 1984) and special education effective teaching
(e.g., Goodman, 1985; Peterson, Albert, Foxworth, Cox, & Tilley, 1985)
literatures all include ongoing objective-referenced monitoring of pupil
progress as an essential component of effective teaching.

Nevertheless, a comprehensive review of special and remedial
education practice in formative evaluation of goals is lacking; and without
such integration of available empirical work, it remains unclear how
practitioners can implement formative evaluation most effectively.
Therefore, the purpose of this paper was to conduct such a review. This
integration is organized into three major sections. In the first part, the
relationship of formative evaluation to the spectrum of special education
assessment practice and its importance to the field are described. Then,
four critical issues in formative evaluation methodology are discussed, with
a review of recent research in each area. These critical issues concern the
form and frequency of assessment and methods of data display and data
utilization. Finally, a proposal is presented for future research.

Importance of Monitoring as a Special Education Assessment Activity

According to Salvia and Ysseldyke (1985), special education
assessment is the process of collecting data in order to specify and verify
problems and to formulate decisions about pupils with respect to referral,
screening, classification, instructional planning, and program modification.
Decisions in the first three assessment phases constitute the identification
process, wherein norm-referenced comparisons among pupils are made to judge
whether students are sufficiently discrepant from peers to require special
intervention.

Decisions in the remaining assessment phases, instructional planning 
and program modification, are related more integrally to instructional 
program content and methodology. In the first of these two phases, specific 
problems are identified for educational intervention, student 
characteristics are described, the educational ecology is assessed, and an 
initial hypothesis concerning instruction is generated. In the second
phase, the effectiveness of the instructional hypothesis is evaluated
through ongoing measurement of pupil progress, and the cycle of postulating 
and testing instructional hypotheses continues. 

Theoretically, the instructional planning and program modification
phases of assessment complement one another. Nevertheless, in practice,
they have become associated with markedly different and, for the most part,
mutually exclusive approaches to the development of special education
instructional programs. The first approach, aptitude-treatment interaction
(ATI), embodies the instructional planning phase: the initial description of
learners, wherein student aptitudes are presumed to interact predictably
with instructional treatments to produce comparatively strong student
learning. Thus, with an ATI approach, the development or selection of
educational programs is deductive, derived from prior explication of learner
characteristics.

The second approach is formative evaluation or ongoing 
objective-referenced monitoring of pupil progress. Whereas an ATI approach
emphasizes the importance of the first phase of program planning, describing
salient learner characteristics, formative evaluation embodies program
planning's second major phase, ongoing evaluation and modification of
proposed programs, wherein student performance is measured repeatedly under
different instructional conditions. The purpose of this measurement is to
provide a data base with which effective instructional programs may be
developed empirically. Thus, formative evaluation is an inductive, rather
than deductive, approach to developing instructional plans.

An ATI approach represents the traditional and prevalent method for 
formulating educational programs (e.g., Ysseldyke & Thurlow, 1984). 
Nevertheless, important problems have been associated with ATI-related
methodology, and formative evaluation appears more tenable for several
related reasons. First, formative evaluation avoids ATI's reliance on
initial diagnoses of learner characteristics for prescribing treatment when
(a) conceptualizations of cognitive abilities are incomplete (Ysseldyke,
1979), (b) available tests of pupil characteristics are psychometrically
inadequate (Arter & Jenkins, 1979; Coles, 1978; Glaser, 1981; Salvia &
Ysseldyke, 1985), and (c) the nature of interactions among learner and
teacher characteristics, educational treatments, and classroom environments,
to a large extent, is unknown (Ysseldyke, 1979). Second, the formative
evaluation-related practice of repeated measurement by classroom teachers in
classroom settings is more ecologically valid and less reactive than the use
of traditional assessment procedures associated with typical ATI approaches
(D. Fuchs & L. Fuchs, in press a). Furthermore, it avoids traditional
testing procedures associated with ATI methodology (e.g., one test session
administered by an unfamiliar examiner), which may discriminate
systematically against handicapped, minority, and low socioeconomic students
(D. Fuchs & L. Fuchs, in press b; Fuchs, Fuchs, Power, & Dailey, 1985).

Perhaps most importantly, however, research on the effectiveness of 
formative evaluation is more promising than the literature associated with
ATI approaches. In a recent meta-analysis of the effects on student 
achievement of formative evaluation (L. Fuchs & D. Fuchs, in press), the 
average effect size was .70. This indicates one can expect students whose 
programs are monitored systematically and developed formatively over time to 
achieve, on average, seven-tenths of a standard deviation unit higher than 
students whose programs are not monitored systematically and developed 
formatively. In terms of the standard normal curve and an achievement test 
scale with a population mean of 100 and standard deviation of 15, the use of 
formative evaluation to generate and evaluate individualized programs can be
expected to raise the typical achievement outcome score from 100.00 to 
110.50, or from the 50th to the 76th percentile. 
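
The conversion from an effect size to a scale score and a percentile can be
checked with a short calculation. The following is a minimal sketch in
Python, assuming normally distributed scores; the function name
expected_outcome is hypothetical and is introduced only for illustration.

```python
from statistics import NormalDist


def expected_outcome(effect_size, mean=100.0, sd=15.0):
    """Translate a standardized effect size into a scale score and percentile.

    Assumes scores follow a normal distribution with the given mean and sd.
    """
    score = mean + effect_size * sd
    percentile = NormalDist().cdf(effect_size) * 100
    return score, percentile


score, percentile = expected_outcome(0.70)
print(f"{score:.2f}", round(percentile))  # 110.50 76
```

Under these assumptions, an effect size of .70 corresponds to a shift of
10.5 scale points and to roughly the 76th percentile of the unmonitored
distribution, which agrees with the figures reported in the text.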

The use of formative evaluation, then, appears to increase academic 
achievement reliably, and this program planning methodology is available for
practitioners to inductively formulate instructionally related assessment
decisions and successful individual educational programs. This conclusion
contrasts sharply with (a) the research base indicating that ATI approaches,
specifically, fail to improve the achievement of handicapped learners (e.g.,
Arter & Jenkins, 1979; Hammill & Larsen, 1974; Hammill & Wiederholt, 1973),
and (b) a body of literature that questions the effectiveness of special
education in general (e.g., Dunn, 1968; Glass, 1983).

Consequently, this alternative methodology for developing students'
instructional programs may represent a critical component of special
education assessment practice. Furthermore, its potential importance is
highlighted by and implicitly recognized in the Education for All
Handicapped Children Act of 1975, which mandates that individualized
educational programs (IEPs) include evaluation procedures for assessing
whether goals and objectives are being achieved. Criterion-referenced
formative evaluation appears consonant with this Federal mandate and with
public demand, in general, for accountability in the schools.

Although substantive compliance with this Federal legislation
suggests that special educators routinely engage in criterion-referenced
measurement for evaluation of goal attainment (Deno & Mirkin, 1977; Fuchs &
Fuchs, 1984), research (Fuchs, Fuchs, & Warren, 1982) on the procedures
special educators use to evaluate student mastery of IEP goals indicates
that they formulate criterion-referenced decisions primarily on the basis of
unsystematic observation, rather than on the basis of assessment data (Fuchs
& Fuchs, 1984). Unfortunately, although teachers express confidence in the
accuracy of those criterion-referenced judgments (Fuchs, Fuchs, & Warren,
1982), their informal evaluations about objective mastery tend to be
inaccurate and to overrate student performance (Fuchs & Fuchs, 1984). This
suggests the need for practitioners to design and implement
criterion-referenced measurement systems for formulating valid
criterion-referenced decisions concerning student progress. Below, critical
issues for teachers to consider in designing these systems are discussed.

Designing Criterion-Referenced Formative Evaluation Systems:

Critical Dimensions
Analysis of the literature on criterion-referenced formative
evaluation for mildly handicapped pupils indicates that certain elements of
ongoing monitoring systems may be critical in effecting desired student
achievement outcomes. As practitioners design procedures for measuring
student progress toward goal attainment, four important elements are the
focus of measurement, frequency of measurement, data display, and
data-utilization methods. A discussion of research related to each
dimension follows.
Focus of Measurement

In designing measurement systems, a first consideration is the focus
of measurement, which involves specifying measurement and instructional
goals. Three relevant dimensions of goal statements are: What simple,
observable behaviors are critical indicators of student performance?; What
is an appropriate breadth of goal statement?; and What principles are useful
for determining mastery criteria?

Critical behaviors. A relatively new, but growing body of research
concerns what critical behaviors can be used to monitor mildly handicapped
students' progress in basic skills reliably, validly, and practically. This
research provides practitioners with information concerning what behaviors
are critical indicators of student growth and, relatedly, what behaviors
might be useful to incorporate into basic skills goal statements.

Findings indicate that repeated 1- to 3-minute measurements of simple
behaviors, such as reading aloud isolated words or passages (Deno, Mirkin, &
Chiang, 1982), writing words in response to story starters (Deno, Marston, &
Mirkin, 1982), and spelling words or letter sequences (Deno, Mirkin, Lowry,
& Kuehnle, 1980), provide meaningful time series of academic performance.
The measures demonstrate stability (Fuchs, Deno, & Marston, 1983),
interscorer and alternate form reliability (Marston, 1982; Marston & Deno,
1981), sensitivity to student growth (Marston, Fuchs, & Deno, 1985),
criterion validity with respect to construct validated, widely accepted tests
(Deno, Marston, & Mirkin, 1982; Deno, Mirkin, & Chiang, 1982; Deno et al.,
1980) and teacher ratings (Marston et al., 1985; Fuchs, 1981),
discriminative validity with respect to students' special education label
(Deno, Marston, & Mirkin, 1982; Deno, Mirkin, & Chiang, 1982; Deno et al.,
1980), and logistical feasibility for practitioners (Fuchs, Wesson, Tindal,
Mirkin, & Deno, 1981). These findings indicate which behaviors might be
incorporated into goal statements and can be used to track student progress
over time by applied behavior analysts, precision teachers, and other
designers of monitoring systems.

Breadth of goal statements. In comparison to general consensus in
the literature concerning what behaviors are useful for monitoring, greater
controversy surrounds the issue of breadth of goal statement toward which
progress should be assessed. Currently, practitioners can monitor progress
toward one of two types of goal statements, each representing a different
breadth: One focuses on attainment of long-term goals, the other on mastery
of short-term objectives.

With a long-term goal approach, an annual goal is specified and a
large pool of related measurement items is created. From this measurement
pool, subsets of items, or monitoring probes, are drawn randomly (see Fuchs,
Deno, & Mirkin, 1984), and the difficulty level of the monitoring probes
remains constant over the year. Contrastingly, with a short-term objective
approach, a series of objectives corresponding to steps within a
hierarchical curriculum is specified, and a series of relatively
circumscribed, small pools of items are created, each of which corresponds
to a specific objective (see Lindsley, 1971; White & Haring, 1980). The
difficulty level of material on which students are measured increases as
students master the sequentially related objectives.
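
The random drawing of monitoring probes under a long-term goal approach can
be illustrated as follows. This is only a sketch: the pool of 200 items, the
probe size of 20, and the names draw_probe and item_pool are assumptions
introduced for illustration, not values or procedures taken from the studies
cited.

```python
import random


def draw_probe(item_pool, probe_size, rng):
    """Draw one monitoring probe as a random subset of the annual-goal pool.

    Sampling is without replacement, so no item repeats within a probe.
    """
    return rng.sample(item_pool, probe_size)


# Hypothetical pool: 200 items that all represent the annual goal material.
item_pool = [f"word_{i:03d}" for i in range(200)]

# A fixed seed is used only so that the sketch is reproducible.
rng = random.Random(0)

# Every probe comes from the same pool, so difficulty stays constant.
weekly_probes = [draw_probe(item_pool, 20, rng) for _ in range(4)]
```

Because each probe is sampled from one unchanging pool, scores from
different occasions refer to material of the same difficulty. A short-term
objective approach would instead replace the pool each time an objective is
mastered.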

By definition, both types of measurement are ongoing,
criterion-referenced, curriculum-based, and enjoy strong curricular validity
or correspondence between tests and programmatic goals and objectives
(McClung cited in Yalow & Popham, 1983). However, these systems can be
characterized by important conceptual and technical differences. A
short-term objective strategy has stronger instructional validity or
correspondence between tests and instruction (Yalow & Popham, 1983). The
monitoring probes for short-term measurement are related directly to current
instructional material; so, for example, if an instructional intervention is
introduction of the r-controlled phonics rule, the monitoring measure is
reading r-controlled words. Alternately, with a long-term goal approach,
the monitoring probes are not related directly to the instructional
material. The instructional intervention may be introduction of the
r-controlled phonics rule, whereas the monitoring measure may involve oral
reading fluency, accuracy, and/or comprehension on second grade passages.

Although a short-term objective approach enjoys stronger
instructional validity (Bowers, 1980), a long-term goal strategy possesses
at least three advantages. First, it demonstrates better content validity
or representation of the ultimate desired performance, i.e., reading
fluency/comprehension (see Fuchs & Fuchs, 1985a). Second and relatedly, its
concurrent validity or correlation with other measures of achievement is
stronger than that of a short-term objective method (Fuchs, 1982). Finally,
data analysis is facilitated with a long-term goal approach: Teachers
analyze student performance on material representing the same level of
difficulty across a long time period, so data analysis can occur across any
contiguous portions of a graph. By contrast, trend lines cannot be applied
across time within a short-term goal approach when objective mastery occurs
and the measurement domain simultaneously shifts in difficulty.

In addition to these conceptual and technical differences, these
alternative procedures have different practical implications for
practitioners. Short-term objective measurement is easier for practitioners
to understand and, therefore, teachers seem to prefer it for communicating
progress to fellow professionals and parents (Fuchs, Wesson, Tindal, Mirkin,
& Deno, 1982). However, it also requires teachers to create new monitoring
measures often, as students master the hierarchy of objectives. This
frequent change in measurement requires additional time commitments from
teachers (Fuchs, Wesson, Tindal, Mirkin, & Deno, 1982).

Therefore, important conceptual, technical, and practical
differences are inherent in these two approaches to determining what to
measure. Even more importantly, however, critical differences in student
achievement outcomes are associated with these alternative monitoring
procedures. In a recent meta-analysis, Fuchs and Fuchs (1985a) found that
when progress was measured toward long-term goals, effect sizes calculated
on global achievement test dependent measures were an average .51 higher
than on outcome measures that were similar to the monitoring probes. On the
other hand, when progress was measured toward a series of short-term goals,
effect sizes were a mean .40 lower on dependent measures that represented
global achievement tests than on probe-like measures.

These findings indicate that in order to promote the type of outcome
special educators desire (i.e., global growth vs. mastery of discrete
curriculum units), goal monitoring methods need to be selected carefully.
Specifically, as practitioners develop programmatic or IEP goals and
objectives and related curriculum-based monitoring procedures, both the
curricular and content validity of their measurement procedures must be
addressed. Curricular validity refers to the match between testing and IEP
goals and objectives; content validity, the correspondence between testing
and the true domain in which proficiency is desired (Yalow & Popham, 1984).
Curricular and content validity are addressed simultaneously only when
practitioners write "significant rather than trivial" IEP goals and
objectives, which relate well to the true desired outcome performance
(Popham et al., 1985). Attention to this dual criterion allows
"measurement-driven instruction" (Popham et al., 1985), or ongoing
assessment of pupil progress for instructional planning, to assume an
important effect on achievement. It implies that practitioners monitor
progress toward long-term goals, an approach that appears to promote global
effects on student achievement. Practitioners may wish to use this strategy
to complement analysis of short-term goal mastery, a system that, on the
other hand, can guide instructional programming decisions more directly.

The finding that long-term goal monitoring relates better to global
achievement outcome measures may be especially important in the education of
remedial and handicapped students, who typically have poorly developed
strategies for maintaining and transferring skills (Anderson-Inman, Walker, &
Purcell, 1984; White, 1984). Short-term goal measurement focuses on
instructionally related, relatively restricted domains of material for a
time period and then, upon mastery of that material, the measurement and
instructional focus simultaneously changes. Such a paradigm may be
problematic for at least two reasons. First, it may discourage teachers
from reviewing material sufficiently to allow for long-term skill
maintenance. Second, a close connection between instruction and measurement
may encourage teachers to teach new skills to students within the framework
of measurement tasks. For example, if the measurement procedure requires
the pupil to read consonant-vowel-consonant words from a list, the teacher
may focus instruction on reading these words from lists. As noted by
Goodstein (1982), there may be danger in tying the instructional format too
closely to the assessment device or in narrowly defining content-x-format
domains of criterion-referenced assessment. Such a restricted instructional
format may limit generalization of skills. A more global, long-term
approach to monitoring may encourage teachers to incorporate instructional
procedures that better promote skill maintenance and generalization.

Goal ambitiousness. A third issue relevant to the question of what
to measure concerns goal ambitiousness, or the mastery criteria toward which
teachers and students strive. Inherent in the nature of special education
goals is improvement of student growth rates. Nevertheless, a persistent
and ubiquitous problem in goal specification is ambitiousness: Given a
current performance level and an academic year's worth of special education
instructional opportunity, how ambitiously ought teachers and IEP teams
establish student expectations for improvement?

In exploring this question, Fuchs, Fuchs, and Deno (1985)
investigated the importance of goal ambitiousness and goal mastery to
student achievement. Subjects were 58 mildly to moderately learning
disabled, educable mentally retarded, and behavior disordered students,
whose special education teachers assessed baseline performance and set
reading goals according to a standard format. On the basis of the relation
between baseline and the anticipated goal performance, students were
assigned to goal ambitiousness groups. For 18 weeks, teachers implemented
students' goals. Then, end-of-treatment goal mastery was determined, and
pre- and posttest achievement scores were entered into analyses of
covariance. These analyses revealed that goal ambitiousness was associated
positively with achievement; goal mastery was not.

Although this study was correlational rather than experimental,
results provide tentative evidence that when teachers establish moderately
to highly ambitious goals, students achieve better, regardless of whether or
not goals actually are attained. Given the movement in recent decades to
create "schools without failure" (Glasser, 1969), to provide "riskless"
special education (Hann, 1984), and to develop educational goals that ensure
goal attainment (cf. Clifford, 1984), the finding that goal ambitiousness,
not mastery, was associated with achievement may be unsettling. However, an
optimal challenge, which increases teachers' and students' persistence, task
initiation, and task resumption, may lead to improved task performance
(Clifford, 1984), and these constructive effects of striving may be
facilitated by factors inherent in many ongoing monitoring methodologies
(see Fuchs et al., 1985). These factors include high and concrete goal
awareness as well as unambiguous, easily available, and highly detailed
assessment and evaluation information (Clifford, 1984). Therefore, it
appears that relatively ambitious goals may be an important dimension as
teachers determine procedures for monitoring student progress.
Frequency of Measurement 

Criterion-referenced formative evaluation involves ongoing
collection of student performance data. Yet, the precise frequency with
which measurement occurs can vary and must be determined by practitioners.
Relevant considerations in making this determination are technical,
practical, and effectiveness concerns (Deno, Mirkin, & Fuchs, 1992). Three
important respective questions are: What measurement frequency renders
reliable, valid, and sensitive representations of student achievement?; What
measurement frequency can be employed by a teacher without excessive time
demands?; and What measurement frequency relates to improved student growth?
Each of these considerations is explored briefly below.

Technical considerations. Criterion-referenced measurement has
received increasing attention over the years as an alternative to
traditional, global, normative assessment, because criterion-referenced
measures typically are isomorphic with respect to instructional objectives
and, therefore, have stronger curricular validity. While this strength is
appealing to practitioners, it fails to constitute sufficient grounds for
psychometric adequacy. In fact, among published criterion-referenced tests,
there is scant empirical support for technical strength. Inspection of 12
commercial criterion-referenced tests revealed that only four test manuals
addressed reliability or validity at all, and authors of only two
instruments investigated more than one aspect of test adequacy (Tindal et
al., 1983). Relatedly, empirical analyses of commercial
criterion-referenced tests revealed varying indices of reliability and
validity, with many estimates falling considerably below acceptable levels
(Tindal et al., 1985). Additionally, criterion-referenced assessment
frequently requires educators to create their own testing materials and
procedures; and given the time-consuming nature of reliability and validity
studies, it is infeasible to investigate psychometric characteristics of
every teacher-created test.

Thus, the technical acceptability of criterion-referenced measures 
remains largely unknown. Nevertheless, measurement theory indicates a focus 
on the methodology of measurement, rather than on the content and format of 
each test, might result in acceptable reliabilities for criterion-referenced 
tests. If it were demonstrated that certain measurement methods tend to 
enhance the acceptability of criterion-referenced measurement, then those 
methods could be employed with any test to improve reliability. 

Frequency of measurement has been identified as an aspect of 
criterion-referenced assessment methodology that affects reliability. White 
(1971) established that in order to project a reliable performance trend, a 
minimum of seven data points was necessary. This finding indicates that to 
ensure an adequate data base on which to support decisions concerning the 
efficacy of student programs and to avoid prolonged utilization of 
inappropriate instructional strategies, practitioners should collect data 
frequently. 
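
The seven-point minimum can be illustrated with a short sketch. The 
least-squares slope below is an assumption made for illustration only; the 
text does not say which trend-estimation procedure White (1971) used, and 
the scores are invented. 

```python
MIN_POINTS = 7  # minimum number of data points cited in the text (White, 1971)


def trend_slope(scores):
    """Least-squares slope of scores against session number (1, 2, 3, ...).

    Ordinary least squares is an illustrative choice, not necessarily the
    procedure used in the cited work.
    """
    n = len(scores)
    if n < MIN_POINTS:
        raise ValueError(f"need at least {MIN_POINTS} data points, got {n}")
    sessions = range(1, n + 1)
    mean_x = sum(sessions) / n
    mean_y = sum(scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sessions, scores))
    sxx = sum((x - mean_x) ** 2 for x in sessions)
    return sxy / sxx


# Hypothetical words-correct counts from seven measurement sessions.
scores = [20, 22, 21, 25, 24, 27, 29]
print(round(trend_slope(scores), 2))  # gain per session
```

With fewer than seven scores the function refuses to project a trend, which 
is the practical consequence of the finding: the less often a teacher 
measures, the longer it takes before any trend can be evaluated. 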

Fuchs, Deno, and Marston (1983) investigated another aspect of 
technical adequacy related to measurement frequency. Borrowing from the 
measurement literature demonstrating that behavior averaged over occasions 
reduces measurement error (Epstein, 1980), they hypothesized that the 
stability of criterion-referenced measurement should improve as the number 
of observations over which estimates are aggregated increases. In a series 
of two experiments, they demonstrated that, for initially imprecise 
curriculum-based measures of academic proficiency, aggregating estimates of 
performance over as few as two occasions increased stability well within 
acceptable levels. Therefore, these findings also support frequent data 
collection. 
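
One way to see why averaging over occasions helps is the Spearman-Brown 
projection from classical test theory. It is offered here only as an 
illustration: the cited study may have estimated stability differently, the 
formula assumes parallel measurements, and the single-occasion reliability 
of .70 is a hypothetical value. 

```python
def aggregated_reliability(r_single, k):
    """Spearman-Brown projection: reliability of the mean of k parallel
    measurements, given the reliability of one measurement."""
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical single-occasion reliability of .70, averaged over k occasions.
for k in (1, 2, 3, 5):
    print(k, round(aggregated_reliability(0.70, k), 2))
```

Under these assumptions most of the gain comes from the first added 
occasion, which is consistent with the report that aggregating over as few 
as two occasions was sufficient. 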

Practical concerns. While technical issues seem to support daily 
measurement, practical considerations suggest a leaner measurement schedule. 
Evidence (Fuchs, Wesson, Tindal, Mirkin, & Deno, 1981) indicates that 
preparing for measurement, measuring, scoring performance, recording scores, 
and putting materials away can be time-consuming: Ten teachers, trained 
during a series of workshops, initially spent almost 13 minutes per 
measurement task; after considerable practice in the field, they devoted an 
average of approximately 2 minutes. These findings were replicated by Rieth 
and colleagues (see Rieth, 1982). Multiplied across a caseload of 15 
students, each of whom is measured on three curriculum tasks, these results 
suggest that measurement potentially can occupy a significant amount of 
teacher time. Therefore, a measurement frequency of twice weekly rather 
than daily may represent a reasonable compromise: Teacher measurement time 
is reduced while technical properties may be maintained. 
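
The arithmetic behind this compromise can be sketched as follows. The 
caseload, task count, and 2-minute figure come from the text; the five-day 
school week and the assumption that every student is measured on all three 
tasks on each occasion are added for illustration. 

```python
def weekly_minutes(students, tasks, minutes_per_task, sessions_per_week):
    """Total teacher minutes per week spent on measurement."""
    return students * tasks * minutes_per_task * sessions_per_week


# 15 students, 3 tasks each, about 2 minutes per task after practice.
daily = weekly_minutes(15, 3, 2, sessions_per_week=5)         # assumed 5-day week
twice_weekly = weekly_minutes(15, 3, 2, sessions_per_week=2)

print(daily, "minutes per week with daily measurement")
print(twice_weekly, "minutes per week with twice-weekly measurement")
```

Under these assumptions daily measurement costs 450 minutes per week and 
twice-weekly measurement costs 180 minutes, a saving of 4.5 hours. 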

Effectiveness concerns. In further support of a twice weekly 
measurement schedule, research indicates that no additional benefits accrue 
to student achievement as a function of increasing measurement frequency 
beyond twice weekly. In a quantitative synthesis of relevant controlled 
studies, L. Fuchs and D. Fuchs (in press) found that the average effect 
sizes associated with measurement that occurs twice weekly, three times 
weekly, and daily, respectively, were .85, .41, and .69, with no related 
statistically significant difference. 

Therefore, the available research base indicates that daily 
measurement may generate the most technically adequate data base. However, 
the literatures addressing practical and effectiveness considerations, 
respectively, suggest that (a) practitioners' time constraints militate 
against daily measurement, and (b) student achievement effects associated 
with daily and twice weekly measurement are comparable. These findings 
suggest that practitioners may wish to collect student performance data 
twice weekly. 

Data Display 

For measurement systems in which time-series data are essential, 
such as ongoing monitoring of student progress and applied behavior 
analysis, agreement prevails that graphing is critical. It assists in 
organizing data for formative evaluation, provides a detailed numerical 
summary and visual description of performance, and facilitates communication 
of program results (Tawney & Gast, 1984). Moreover, available research on 
the effectiveness of ongoing monitoring systems indicates achievement is 
associated with graphed displays. When data are charted rather than simply 
recorded, achievement improves approximately .50 of a standard deviation 
unit (L. Fuchs & D. Fuchs, in press). 

Despite concurrence on the importance of graphing and empirical data 
to support its effectiveness, important differences exist concerning 
specific graphing conventions, the most salient of which may be the type of 
graphing paper employed. Some programs advocate the use of ratio or 
logarithmically scaled graph paper (e.g., Lindsley, 1977; White & Haring, 
1980), where the rate scale is adjusted to display proportional changes in 
student behavior. For example, the distance from 10 to 20 is identical to 
the change from 20 to 40 or from 40 to 80. In contrast, developers of other 
monitoring systems support the use of equal interval, or conventional, 
graphing paper (e.g., Deno & Mirkin, 1977). Below, the related controversy 
is reviewed. 
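
The property of the ratio scale described above can be checked numerically. 
The sketch below is an illustration only; the function names are 
hypothetical and base-10 logarithms are an arbitrary choice (any base gives 
the same equalities). 

```python
import math


def ratio_scale_distance(a, b):
    """Vertical distance between two rates on logarithmically scaled paper."""
    return math.log10(b) - math.log10(a)


def equal_interval_distance(a, b):
    """Vertical distance between two rates on conventional paper."""
    return b - a


for low, high in [(10, 20), (20, 40), (40, 80)]:
    print(
        low,
        high,
        round(ratio_scale_distance(low, high), 3),
        equal_interval_distance(low, high),
    )
```

Each doubling covers the same distance on the ratio scale, whereas on equal 
interval paper the same three changes cover 10, 20, and 40 units. 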

Logarithmically scaled paper has been described as technically 
superior by its proponents (see, for example, White & Haring, 1980), because 
the ratio scale is supposed to reflect the proportional way in which natural 
change occurs more accurately than equal interval paper. Additionally, 
logarithmic graphing paper has been described as more feasible than equal 
interval paper because, given the large behavior range incorporated in one 
graph, a single chart can be used to display all relevant behaviors and can 
be used to make comparisons among different behaviors (White & Haring, 
1980). 

On the other hand, equal interval graphing may facilitate data 
analysis (Tawney & Gast, 1984), and has been characterized frequently as 
easier for students and teachers to understand. Some propose that this 
understanding and relative ease in data analysis may, in turn, result in 
more consistent implementation of monitoring systems (Mirkin, Fuchs, & Deno, 
1982). Additionally, Marston (1982) explored the prediction capabilities of 
the two graphing methods and found that trend lines on equal interval paper 
predicted future performance more accurately than did trend lines drawn on 
ratio scaled paper. This controversy surrounding relative merits associated 
with the graphing methods continues. Nevertheless, except for the Marston 
study, no empirical contrast of technical or teacher and student concerns 
has been identified. Therefore, the objective data base is inadequate to 
support the technical or logistical strengths of either graphing approach. 

Despite continuing discussion of which type of graphing method is 
technically and logistically superior, relatively little attention has been 
directed toward investigation of which method is associated with better 
student achievement. In one relevant identified report, Brandstetter and 
Merz (1978) conducted a series of two experiments, in which they compared 
gains made while charting scores on linear graphs with those made while 
simply recording raw scores, and then compared gains associated with 
charting scores on ratio scaled graphs with those related to simple 
recording. Unfortunately, no attempt was made to contrast the effectiveness 
of graphing on linear and ratio scaled graphs. Furthermore, the children 
employed in the two studies were neither randomly assigned nor similar to 
each other, rendering it impossible to draw valid comparisons between the 
investigations. 

In an attempt to explore the question of how graphing method 
contributes to student achievement, Fuchs and Fuchs (1985b) conducted a 
meta-analysis of available controlled studies on the effectiveness of 
ongoing monitoring systems, coding studies by graphing convention and then 
computing and comparing effect sizes for studies in which equal interval and 
ratio scaled papers were employed. Results indicated that graphing methods 
did not produce a statistically reliable difference in student achievement. 
Moreover, the difference between the mean effect sizes of .2 standard 
deviation unit represented a difference of little practical importance 
(Cohen, 1977). Therefore, student achievement effects do not appear to 
provide a basis on which to select a graphing convention. 

In sum, currently available data support the use of graphing, but 
fail to provide a basis for selecting a specific graphing convention. 
Little available information concerns technical properties; the data base on 
relative logistical strengths and weaknesses of the methods is scant; and 
the literature on effects on student achievement reveals no reliable 
effects. Consequently, as practitioners design criterion-referenced 
formative evaluation procedures for monitoring their pupils' attainment of 
goals, they might consider graphing as an essential procedural element, but 
rely on individual preferences and logistical considerations for deciding 
between equal interval and ratio scaled paper for graphed displays. 

Data Utilization 

Although teachers may collect student performance data according to 
designated time schedules, they frequently fail to employ those data 
meaningfully to develop students' educational programs (Baldwin, 1976; 
White, 1974). For example, Tindal, Fuchs, Mirkin, Christenson, and Deno 
(1981) found that teachers often maintained instructional programs for long 
periods even though student performance data clearly indicated those 
programs were not producing student improvement. To complicate data 
utilization further, although teachers may analyze data correctly to 
recognize when interventions are not effective, they experience difficulty 
generating substantively important modifications in their students' programs 
(Fuchs, Deno, & Mirkin, 1982). 

Examples of data-utilization procedures are provided in the work of 
Haring and White and their colleagues as well as that of Deno and 
associates. Haring, White, and Liberty (1979) developed a set of rules 
entitled "Experimental Data-Decision Rules with Minimal 'Celeration." These 
rules require practitioners to assess student performance in relation to an 
aimline, which connects the baseline date and level of performance with the 
goal date and anticipated performance level. In her study of the 
effectiveness of such rules, Martin (1980) adapted Haring et al.'s 
data-utilization strategy as follows: (a) If a student's performance was on 
the aimline on one day and within five words of the aimline on the next day, 
the teacher progressed the student to the next curriculum step; (b) If a 
student's correct performance was above the aimline for five consecutive 
days, the teacher drew a new aimline parallel to but above the original one; 
and (c) If a student's correct performance was below the aimline on two 
consecutive days, the teacher introduced a change in the instructional 
format, wherein data trends were used to determine the type of instructional 
problem (i.e., inappropriate instructional step or problems of compliance, 
fluency-building, acquisition, or format). The type of instructional 
problem then dictated general strategies concerning the nature of the 
instructional change. 
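
The three "when to change" rules can be sketched in code. This is a 
hypothetical rendering, not Martin's (1980) procedure as implemented: the 
function names, the straight-line aimline, the half-word tolerance used to 
decide that a score is "on" the aimline (on_tol), and the example scores 
are all assumptions added for illustration. 

```python
def aimline(day, start_day, start_level, goal_day, goal_level):
    """Expected score on `day`, on the straight line from baseline to goal."""
    slope = (goal_level - start_level) / (goal_day - start_day)
    return start_level + slope * (day - start_day)


def decide(scores, aims, near=5, on_tol=0.5):
    """Suggest an action from daily scores and matching aimline values.

    scores and aims are parallel lists, oldest day first. near is the
    "within five words" margin of rule (a); on_tol is an assumed tolerance
    for treating a score as lying on the aimline.
    """
    diffs = [s - a for s, a in zip(scores, aims)]
    # Rule (c): below the aimline on two consecutive days.
    if len(diffs) >= 2 and all(d < -on_tol for d in diffs[-2:]):
        return "change instruction"
    # Rule (b): above the aimline for five consecutive days.
    if len(diffs) >= 5 and all(d > on_tol for d in diffs[-5:]):
        return "raise aimline"
    # Rule (a): on the aimline one day, within `near` words the next day.
    if len(diffs) >= 2 and abs(diffs[-2]) <= on_tol and abs(diffs[-1]) <= near:
        return "advance to next step"
    return "continue"


# Hypothetical aimline from 20 words on day 0 to 40 words on day 10.
aims = [aimline(day, 0, 20, 10, 40) for day in range(1, 6)]
print(decide([22, 23, 25, 26, 27], aims))  # falling behind the aimline
print(decide([25, 27, 29, 31, 33], aims))  # consistently above the aimline
print(decide([21, 25, 26, 28, 27], aims))  # on the aimline, then close to it
```

The sketch covers only the timing of a change. Diagnosing the type of 
instructional problem from data trends, the "what to change" step, is not 
modeled. 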

In her experiment, Martin compared effects on student achievement 
among groups that (a) collected but did not graph daily data, (b) graphed 
and employed the above rules to determine what and when to change 
instructional programs, and (c) graphed data and employed the "when to 
change" rules without using the "what to change" rules. Results indicated 
that posttest scores of the second and third groups were significantly 
higher than those of the first group on certain measures, with no 
significant difference between the two data-utilization rule groups. 
Unfortunately, given that graphing, in and of itself, positively affects 
student achievement (L. Fuchs & D. Fuchs, in press), this study is difficult 
to interpret: It provides inadequate control for separating the effects of 
data-utilization from those of simply graphing. However, results do suggest 
that the use of general rules for specifying the nature of changes may not 
result in additional achievement gains over the extent of improvement 
associated with using "when to change" rules. 

Deno and his colleagues have employed two contrasting 
data-utilization procedures. One strategy, a "therapeutic approach," is 
aimline-referenced; the other, an "experimental approach," is 
referenced to the level of performance in the immediately preceding 
instructional phase. With the therapeutic approach, data interpretation 
consists of the application of the following rule: If on 7 to 10 
consecutive data points, the student performance trend is below the aimline, 
then an instructional change is introduced. With the experimental approach, 
the trend, level, variability, and step change in students' performance data 
are analyzed every 7 to 10 consecutive data points, and compared to the same 
indices calculated in the preceding phase. When data analysis occurs in an 
experimental approach, program change is introduced: Data analysis does not 
determine whether change occurs, but rather facilitates determination of 
what to change. If the current program element is relatively ineffective, 
then it is dropped and a new programmatic element is initiated; if 
relatively effective, then it is maintained, but a new program element 
nonetheless is introduced in an attempt to boost the performance level even 
further. 
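
The four indices compared across phases in the experimental approach can be 
sketched as follows. The text does not give formulas, so every definition 
here is an assumption: trend as a least-squares slope, level as the phase 
mean, variability as the population standard deviation, and step change as 
the first score of the current phase minus the last score of the preceding 
phase. The scores are invented. 

```python
from statistics import mean, pstdev


def slope(values):
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    xs = range(len(values))
    mx, my = mean(xs), mean(values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, values))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def compare_phases(previous, current):
    """Change in each index from the preceding phase to the current one."""
    return {
        "trend_change": slope(current) - slope(previous),
        "level_change": mean(current) - mean(previous),
        "variability_change": pstdev(current) - pstdev(previous),
        "step_change": current[0] - previous[-1],
    }


# Hypothetical scores for two consecutive seven-point phases.
previous = [20, 21, 20, 22, 21, 22, 21]
current = [23, 24, 26, 25, 27, 28, 29]
for name, value in compare_phases(previous, current).items():
    print(name, round(value, 2))
```

Under the experimental approach such a comparison informs what to change, 
since a new program element is introduced at the end of each phase in any 
case. 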

Although studies comparing these data-utilization systems to each 
other as well as to those of Haring and his associates are scant and 
inconclusive, investigations document the effectiveness of each rule-based 
approach separately (Mirkin, Deno, Tindal, & Kuehnle, 1980; Deno, 1985). 
Similarly, quantitative synthesis of controlled studies of various types of 
data-utilization systems fails to provide evidence for the differential 
effectiveness of one type of data-utilization method. Nevertheless, 
integration of findings indicates persuasive support for the effectiveness 
of data-utilization rules in general. L. Fuchs and D. Fuchs (in press) 
found that, on average, systematic formative evaluation that incorporated 
evaluation rules increased student achievement approximately .9 of a 
standard deviation unit over systematic formative evaluation without such 
rules. With rules, the mean effect size was .91, indicating that the upper 
50% of the experimental group distribution, wherein evaluation rules were 
employed, exceeded approximately 82% of the control group (no systematic 
formative evaluation) distribution. 
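
The step from an effect size of .91 to the 82% figure can be reproduced 
with the standard normal distribution. The sketch assumes normally 
distributed scores with equal spread in both groups; the function name is 
hypothetical. 

```python
from statistics import NormalDist


def percent_of_controls_exceeded(effect_size):
    """Percentage of the control distribution falling below the median
    experimental-group score, for a standardized mean difference."""
    return NormalDist().cdf(effect_size) * 100


print(round(percent_of_controls_exceeded(0.91)))  # evaluation with rules
print(round(percent_of_controls_exceeded(0.50)))  # charting versus recording
```

The same conversion applies to the other effect sizes reported in this 
review, such as the .50 associated with charting rather than simply 
recording data. 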

Conclusions and Delineation of Research Questions 

Available research supports the use of ongoing criterion-referenced 
monitoring systems to improve the instructional programs of mildly handicapped 
students. Research also provides the basis for several statements on the 
nature of effective monitoring systems among mildly handicapped populations. 
Specifically, the data base supports the use of (a) certain empirically 
validated critical behaviors as the focus of monitoring activity, (b) long-term 
goal statements that may encourage teachers to focus not only on the immediate 
instructional content, but also on maintenance and generalization of skills, 
and (c) relatively ambitious goals that support task persistence and striving. 
The literature also provides rationale for teachers to measure student 
performance at least twice weekly, to graph data using their preferred 
convention, and to employ data-utilization rules for determining when, and 
perhaps how, to modify students' instructional programs. 

Nevertheless, the same literature leaves many critical questions 
unanswered and constitutes the basis for delineating a research program in the 
area of monitoring the progress of mildly handicapped pupils. Among the 
issues requiring additional clarification and empirical exploration, several 
concern teachers' apparent difficulty in implementing monitoring systems and in 
using data meaningfully to develop instructional programs (Baldwin, 1976; 
Rieth, 1982; Tindal et al., 1981; White, 1974). Related research questions 
include: (a) What antecedent and consequential conditions (such as type of 
training, system level support, and professional feedback) increase the 
probability that teachers will measure and evaluate student academic 
performance according to designated schedules? and (b) Does computer 
technology, designed to complement monitoring activities by facilitating data 
collection, storage, graphing, analysis, and evaluation (see Hasselbring, 1985; 
Hasselbring & Hamlett, 1983), affect the rate and accuracy with which 
monitoring procedures are employed and does it improve teachers' instructional 
behavior and/or pupil achievement? 

Additional questions concern dimensions of useful monitoring systems. 
Related research questions include the following: 

(a) What is the effect of goal ambitiousness levels, contrasted within a 
well-controlled experiment, on teacher decisionmaking and on student 
achievement? 

(b) What are the technical and practical effects and student achievement 
outcomes associated with the use of six-cycle and equal interval paper, 
when these graphing conventions are contrasted within an experimental 
study? 

(c) How does the use of data-utilization rules affect teacher 
decisionmaking? 

(d) What are the differential effects of graphing data, using "when to 
change" data-utilization rules, and employing "how to change" 
data-utilization rules on student achievement and teachers' 
instructional behavior? 

(e) How do different data-utilization rules, such as experimental and 
therapeutic decisionmaking, relate to student achievement? 

(f) How can expert systems be incorporated with computer technology to 
facilitate teachers' determination of how to modify students' 
instructional programs, and what are the effects of using such expert 
systems on student achievement and teachers' pedagogical behavior? 

(g) What are the effects of computer printout graphed displays on 
teacher decisionmaking and student achievement? 

These represent a handful of many useful questions that provide 
interesting territory for well conceptualized and designed investigation. 
In light of evidence clearly indicating the efficacy of employing 
systematic ongoing criterion-referenced monitoring systems, such continued 
development and research appears to be potentially important to educators 
of mildly handicapped and remedial students. 